Logistic regression learning algorithm example using the TensorFlow library.

Logistic regression is one of the simplest classification models. In its most basic form it deals with classifying a given set of data points into two possible classes, usually labeled 0 and 1. The logistic regression model thus predicts an output y in {0,1}, given an input vector x. The probability is modeled using the logistic function $$ g(z) = \frac{1}{1+e^{-z}} $$ Namely, the probability of finding the output y=1 is given by $$ q_{y=1} = \hat{y} \equiv g(\mathbf{w}\cdot\mathbf{x} + b)\,, $$ while the probability of finding y=0 is given by $$ q_{y=0} = 1 - q_{y=1}. $$
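
As a quick illustration, the following sketch (plain NumPy, with a made-up weight vector w, bias b, and input x that are not part of the TensorFlow example below) computes these two probabilities for a single input:

In [ ]:
import numpy as np

def sigmoid(z):
    # logistic function g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical learned parameters for a 3-dimensional input
w = np.array([0.5, -1.2, 0.8])
b = 0.1
x = np.array([1.0, 0.3, -0.5])

q_y1 = sigmoid(np.dot(w, x) + b)  # P(y = 1 | x)
q_y0 = 1.0 - q_y1                 # P(y = 0 | x)
print(q_y1, q_y0)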

The weights w and bias b are usually learned in the training step using an optimization algorithm such as gradient descent.

The typical loss function used in logistic regression is the average of the cross-entropies over the sample. For example, if we have N samples, the loss function is given by: $$ L(w) = \frac{1}{N}\sum_{n=1}^{N}H(p_{n},q_{n}) = -\frac{1}{N}\sum_{n=1}^{N}\bigg[y_{n}\log\hat{y}_{n}+(1-y_{n})\log(1-\hat{y}_{n})\bigg] $$
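
To make the training step concrete, here is a minimal NumPy sketch of one full-batch gradient-descent update on this cross-entropy loss. The toy arrays X_train and y_train are purely illustrative and unrelated to the MNIST data used later:

In [ ]:
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: 4 samples, 3 features (hypothetical)
X_train = np.array([[ 1.0,  0.3, -0.5],
                    [ 0.2,  1.1,  0.7],
                    [-0.4,  0.8,  0.1],
                    [ 1.5, -0.2,  0.3]])
y_train = np.array([1, 0, 0, 1])

w = np.zeros(3)
b = 0.0
learning_rate = 0.1

y_hat = sigmoid(X_train.dot(w) + b)
# average cross-entropy loss L(w)
loss = -np.mean(y_train * np.log(y_hat) + (1 - y_train) * np.log(1 - y_hat))
# gradients of L(w) with respect to w and b
grad_w = X_train.T.dot(y_hat - y_train) / len(y_train)
grad_b = np.mean(y_hat - y_train)
# one gradient-descent update
w -= learning_rate * grad_w
b -= learning_rate * grad_b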

In this example we will use the MNIST database of handwritten digits provided in the tensorflow package. The corresponding MNIST labels are numbers between 0 and 9, indicating which digit a given image represents. To deal with this problem we represent the labels as "one-hot" vectors. A one-hot vector is 0 in most dimensions and 1 in a single dimension. In this case, the nth digit is represented as a vector which is 1 in the nth dimension. For example, 3 is represented as [0,0,0,1,0,0,0,0,0,0].
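
The conversion from an integer label to a one-hot vector is straightforward; a minimal NumPy sketch (the helper name to_one_hot is just for illustration, the MNIST loader below already does this when one_hot=True is passed):

In [ ]:
import numpy as np

def to_one_hot(label, num_classes=10):
    # vector of zeros with a single 1 at position `label`
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec

print(to_one_hot(3))  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]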

In the multiclass case the output is given by: $$ \hat{y} = \mathrm{softmax}(g(\mathbf{w}\cdot\mathbf{x} + b)) $$ which can be simplified to: $$ \hat{y} = \mathrm{softmax}(\mathbf{w}\cdot\mathbf{x} + b) $$ and the loss is defined as: $$ L(w) = \frac{1}{N}\sum_{n=1}^{N}H(p_{n},q_{n}) = -\frac{1}{N}\sum_{n=1}^{N}y_{n}\log(\hat{y}_{n}) $$
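
For reference, here is a NumPy sketch of the softmax output and the corresponding multiclass cross-entropy loss. The logits and one-hot labels are toy values chosen only to show the shapes involved:

In [ ]:
import numpy as np

def softmax(z):
    # subtract the per-row max for numerical stability
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

# toy logits w.x + b for 2 samples and 3 classes
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 1.2,  0.3]])
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])  # one-hot labels

y_hat = softmax(logits)
loss = -np.mean(np.sum(y_true * np.log(y_hat), axis=1))
print(loss)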


In [1]:
# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("tmp/data/", one_hot=True)


Extracting tmp/data/train-images-idx3-ubyte.gz
Extracting tmp/data/train-labels-idx1-ubyte.gz
Extracting tmp/data/t10k-images-idx3-ubyte.gz
Extracting tmp/data/t10k-labels-idx1-ubyte.gz

In [7]:
# Import TensorFlow and NumPy
import tensorflow as tf
import numpy as np

# tf Graph Input
X = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
y = tf.placeholder("float", [None, 10]) # 0-9 digits recognition => 10 classes

# Create model
# Set model weights
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

# Construct model
y_pred = tf.nn.softmax(tf.add(tf.matmul(X, W),b)) # Softmax

In [8]:
# Define Training Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1

In [9]:
# Minimize error using cross entropy
# Cross entropy
cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(y_pred), reduction_indices=1))

# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
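
Note that computing the cross-entropy from the softmax output as above can become numerically unstable when y_pred contains values very close to 0. TensorFlow also provides tf.nn.softmax_cross_entropy_with_logits, which fuses the softmax and the cross-entropy into one op; a sketch of the equivalent, more stable cost, assuming the same placeholders and the pre-softmax logits:

In [ ]:
# Alternative: fused softmax + cross-entropy on the raw logits
logits = tf.add(tf.matmul(X, W), b)
cost_stable = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))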

In [13]:
# Initializing the variables
init = tf.global_variables_initializer()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            sess.run([optimizer, cost], feed_dict={X: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += sess.run(cost, feed_dict={X: batch_xs, y: batch_ys})/total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print "Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost)

    print "Optimization Finished!"

    # Test model
    correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print "Accuracy:", accuracy.eval({X: mnist.test.images, y: mnist.test.labels})


Epoch: 0001 cost= 651.181941569
Epoch: 0002 cost= 366.013635308
Epoch: 0003 cost= 304.101509273
Epoch: 0004 cost= 274.323226869
Epoch: 0005 cost= 256.067038000
Epoch: 0006 cost= 243.445543528
Epoch: 0007 cost= 234.052417025
Epoch: 0008 cost= 226.725508586
Epoch: 0009 cost= 220.797414109
Epoch: 0010 cost= 215.790569171
Epoch: 0011 cost= 211.618999228
Epoch: 0012 cost= 208.022925720
Epoch: 0013 cost= 204.804209471
Epoch: 0014 cost= 202.015058577
Epoch: 0015 cost= 199.529044494
Epoch: 0016 cost= 197.226270527
Epoch: 0017 cost= 195.181027442
Epoch: 0018 cost= 193.292408586
Epoch: 0019 cost= 191.592877552
Epoch: 0020 cost= 189.988113314
Epoch: 0021 cost= 188.518636912
Epoch: 0022 cost= 187.155902281
Epoch: 0023 cost= 185.873152584
Epoch: 0024 cost= 184.656975031
Epoch: 0025 cost= 183.516896144
Optimization Finished!
Accuracy: 0.914

In [ ]: